Summary

Aggregation of information for decision making is used in many disciplines. In this research effort, we investigated two new methodologies based on fuzzy set theory to achieve information fusion in computer vision systems. The first scheme may be described as a hierarchical fuzzy-connective-based aggregation network. The second scheme is based on a generalization of the fuzzy integral.

The proposed fuzzy-connective-based aggregation network uses degrees of satisfaction (memberships) of various decision criteria and aggregates the memberships in a hierarchical network. The nature and the parameters of the aggregation connectives are learnt through training procedures. We also presented techniques to determine the structure of such networks when this structure is only approximately known. These techniques also provide a mechanism for selecting powerful features and discarding irrelevant features via the detection of redundancies. Another attractive feature of the proposed approach is that the networks that result after training can be interpreted as a set of "rules" that may be used for decision making. This approach was tested on a variety of computer vision and automatic target recognition problems.

Robust membership generation methods are of crucial importance if the proposed methods are to work effectively. We investigated several membership generation techniques, such as histogram-based methods and cluster-based methods. Although many clustering algorithms have been proposed in the literature, a major drawback of such algorithms is that there is no simple way to determine the number of clusters that best describes the sample data. We devised a compatible cluster merging technique that efficiently determines the optimum number of subspace clusters when an upper bound on the number of clusters present in the image or feature space is known. Our method can also be used to obtain straight-line descriptions of edge images and planar approximations of 3-D (range) data.
The second methodology that we proposed for information fusion is based on the fuzzy integral. The fuzzy integral is a flexible, nonlinear information fusion methodology that combines objective evidence (membership values) with expected values of subsets of the sources (as embodied in a fuzzy measure). We have extended the theoretical utility of the Sugeno fuzzy integral for information fusion in two ways. First, we replaced the Sugeno measures by a more general class of fuzzy measures, the so-called S-decomposable measures. The second extension involves the definition of the fuzzy integral itself: the max operation is changed to one of a selected class of co-t-norms, or the min operation is replaced by one of a class of t-norms. The resulting integrals can be either more or less optimistic than the original fuzzy integral, giving the designer more flexibility in the design of an algorithm (or suite of algorithms) for a particular problem.

Since the process of fuzzy inference concerns the fusion of evidence (how well the input possibility matches that of the antecedent), it is only natural that flexible and powerful aggregation network structures be utilized in the inference mechanism. We generalized the fuzzy inference mechanism so that a parametrically defined family of aggregation operators is used. This made it possible to implement a learning algorithm that allowed the networks to outperform the theoretically predicted performance.

Finally, we also investigated the use of morphological methods for fusing edge information from intensity and range images. Two new edge detection and classification schemes for range images based on morphological operations were developed. Several experiments involving intensity and range image pairs were conducted.
1. Introduction


In complex computer vision systems, several sensors (such as multi-spectral color sensors, range sensors, and stereo views) and several algorithms are commonly employed to reduce the uncertainty and to resolve the ambiguity present in decisions derived from a single sensor or a single algorithm. The advantages of multi-sensor fusion lie in the redundancy, complementarity, timeliness, and cost of the information. Thus, there is a need for methodologies that can aggregate inexact and incomplete information obtained from multiple sources in order to make decisions. In this research effort, we addressed several aspects of the information fusion problem using fuzzy-set-theoretic approaches. A number of new theoretical results and algorithms have emerged from this research. The publications resulting from this research include five journal papers, 13 conference papers, two Ph.D. theses, and two master's theses. In the following pages, we provide brief topic-wise descriptions of the major achievements of this research effort. More details may be found in the publications listed at the end of the document. The major topics addressed under this grant are:

i. Investigation of fuzzy-set-theoretic operators for information fusion

ii. Investigation of training methods to determine the nature and structure of aggregation networks

iii. Elimination of redundant features/criteria and interpretation of networks

iv. Methods for calculating memberships based on observed feature values

v. Experiments with aggregation networks

vi. Theoretical investigation on the generalization of the fuzzy integral

vii. Methods for calculating densities based on observed feature values

viii. Experiments with the fuzzy integral

ix. Evidence aggregation networks for fuzzy logic inference

x. Range and intensity information fusion using morphological methods
2. Investigation of Fuzzy-Set-Theoretic Operators for Information Fusion

In multi-criteria decision making, the requirements imposed by the decision-making process may be of several types. We investigated the properties of several fuzzy-set-theoretic aggregation operators from the point of view of flexibility and trainability. We found that fuzzy set theory provides a host of very attractive aggregation connectives for integrating membership values representing uncertain and subjective information. These connectives can be categorized into the following three classes based on their aggregation behavior: i) intersection (conjunctive) connectives, ii) union (disjunctive) connectives, and iii) compensative connectives. An intersection connective has the property that the aggregated value is high only when all of the inputs are high. A union connective has the property that the aggregated value is high whenever any one of the input values representing different features or criteria is high. Compensative connectives are used when one is willing to sacrifice a little on one factor, provided the loss is compensated by a gain in another factor. Compensative connectives can be further classified into mean (averaging) operators and hybrid operators. Mean operators are defined through an axiomatic approach: they are monotonic operators that satisfy the condition $\min(a,b) \le \operatorname{mean}(a,b) \le \max(a,b)$. Hybrid operators are defined as the weighted arithmetic or geometric mean of a pair of conventional union and intersection operators.
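The three classes can be illustrated with a small Python sketch. The concrete operators are assumptions chosen for the example: min as the intersection, max as the union, the arithmetic mean as a mean operator, and, as a hybrid operator, the weighted geometric mean of the algebraic product (an intersection) and the algebraic sum (a union), which is the form of Zimmermann and Zysno's γ-model. The function names are hypothetical.

```python
import math


def intersection_min(values):
    """Conjunctive connective: high only when all inputs are high."""
    return min(values)


def union_max(values):
    """Disjunctive connective: high whenever any input is high."""
    return max(values)


def arithmetic_mean(values):
    """A simple mean (averaging) operator; always between min and max."""
    return sum(values) / len(values)


def gamma_model(values, gamma):
    """Hybrid operator: weighted geometric mean of the algebraic product
    (an intersection) and the algebraic sum (a union).
    gamma = 0 gives the product, gamma = 1 gives the algebraic sum."""
    product = math.prod(values)
    algebraic_sum = 1.0 - math.prod(1.0 - v for v in values)
    return product ** (1.0 - gamma) * algebraic_sum ** gamma


memberships = [0.9, 0.6, 0.3]
print(intersection_min(memberships), union_max(memberships))
print(arithmetic_mean(memberships), gamma_model(memberships, 0.5))
```

With these memberships, min and max bracket the arithmetic mean, as the mean-operator condition requires, and the hybrid operator moves from the product (γ = 0) toward the algebraic sum (γ = 1) as γ grows.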

All these connectives are very powerful and flexible in that, by choosing suitable parameters, one can control not only the nature (e.g., conjunctive, disjunctive, or compensative) but also the attitude (e.g., pessimistic or optimistic) of the aggregation. Our investigations indicate that Yager's union and intersection operators, the generalized mean, and the γ-model are particularly suited to model most of the aggregation behaviors that may be encountered in computer vision and artificial intelligence. We conducted a thorough theoretical study of the properties of these operators. Details of the new properties discovered, along with their proofs and graphical depictions, may be found in [1,6,7,12,13]. Many of these properties are very interesting and useful from the point of view of uncertainty management.
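The following sketch shows how a single parameter sets the attitude of two of the families named above, Yager's operators and the weighted generalized mean. It assumes the standard textbook forms of these operators; the parameter names (w, p, weights) are chosen for illustration.

```python
def yager_intersection(values, w):
    """Yager t-norm: 1 - min(1, (sum (1 - a_i)^w)^(1/w)), w > 0.
    Large w approaches min; small w is much more pessimistic."""
    s = sum((1.0 - a) ** w for a in values) ** (1.0 / w)
    return 1.0 - min(1.0, s)


def yager_union(values, w):
    """Yager t-conorm: min(1, (sum a_i^w)^(1/w)), w > 0. Large w approaches max."""
    return min(1.0, sum(a ** w for a in values) ** (1.0 / w))


def generalized_mean(values, weights, p):
    """Weighted generalized mean (sum w_i a_i^p)^(1/p), p != 0.
    Weights are nonnegative and sum to 1; for p < 0 all a_i must be > 0.
    Small p approaches min (pessimistic), large p approaches max (optimistic)."""
    return sum(wt * a ** p for wt, a in zip(weights, values)) ** (1.0 / p)


v = [0.9, 0.6, 0.3]
for p in (-10.0, 1.0, 10.0):
    print(p, generalized_mean(v, [1 / 3] * 3, p))
print(yager_intersection(v, 2.0), yager_union(v, 2.0))
```

Increasing p moves the generalized mean from near the minimum toward the maximum, which is the sense in which one parameter controls both the nature and the attitude of the connective.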
3. Investigation of Training Methods to Determine Nature and Structure of Aggregation Networks

There are several ways to model the uncertainty management problem. In our first approach, we formulate the uncertainty management problem as a multi-criteria decision-making problem as follows. The support for a decision may depend on the supports for (or degrees of satisfaction of) several different criteria, and the degree of satisfaction of each criterion may in turn depend on the degrees of satisfaction of other sub-criteria, and so on. Thus, the decision process can be viewed as a hierarchical network, where each node in the network "aggregates" the degree of satisfaction of a particular criterion from the observed support. The inputs to each node are the degrees of satisfaction of each of the sub-criteria, and the output is the aggregated degree of satisfaction of the criterion. Thus, the decision-making problem reduces to i) selecting suitable features (criteria) for the problem at hand, ii) finding ways to compute the input supports (degrees of satisfaction of criteria) based on the values of the selected features (criteria), and iii) determining the structure of the network and the nature of the connectives at each node of the network using training procedures.
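A hierarchical network of this kind can be evaluated recursively. In the hypothetical sketch below, each node is assumed to use a weighted generalized mean as its connective; the criterion names (elongation, size, hot_spot) and all parameter values are invented for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Node:
    """One aggregation node (hypothetical structure).

    Combines the degrees of satisfaction of its sub-criteria with a weighted
    generalized mean, an assumed choice of connective. Each input is either
    a leaf criterion name (str) or another Node.
    """
    inputs: List[Union[str, "Node"]]
    weights: List[float]   # nonnegative, summing to 1
    p: float = 1.0         # smaller p -> toward min, larger p -> toward max; p != 0

    def evaluate(self, memberships: Dict[str, float]) -> float:
        # Recursively obtain each sub-criterion's degree of satisfaction.
        values = [memberships[x] if isinstance(x, str) else x.evaluate(memberships)
                  for x in self.inputs]
        return sum(w * v ** self.p for w, v in zip(self.weights, values)) ** (1.0 / self.p)


# Hypothetical two-level network: "shape" is itself aggregated from two leaf criteria.
shape = Node(["elongation", "size"], [0.5, 0.5], p=-2.0)
decision = Node([shape, "hot_spot"], [0.7, 0.3], p=2.0)
memberships = {"elongation": 0.8, "size": 0.6, "hot_spot": 0.9}
print(decision.evaluate(memberships))
```

Because every node here is a mean operator, each output stays between the smallest and largest of that node's inputs; a trained network could instead use conjunctive or disjunctive connectives at some nodes.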

We have explored several training methods based on constrained gradient descent [1,5,6,14]. We have suggested methods for eliminating the constraints so that training can be viewed as a simple minimization problem. Six algorithms have been developed; they may be found in [1]. Three of the algorithms are meant for single-level aggregation, and the other three are for multi-level aggregation. Methods for speeding up the training procedures are also discussed. The convergence properties of the training algorithms have also been investigated. We showed that the algorithms converge to a unique (global minimum) solution under some conditions, which is a highly desirable result, particularly for gradient descent methods [1,5,6]. Our training methods determine not only the nature (disjunctive, conjunctive, compensative) of the connective to be used at each node, but also the values of the parameters (relative importance of criteria) at each node and the attitude (pessimistic, optimistic) of each node.
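To make the training idea concrete, here is a minimal sketch, not one of the six algorithms of [1]: a single generalized-mean node is fitted by gradient descent, with the constraints (nonnegative weights summing to one) eliminated by a softmax reparameterization. The softmax substitution, the finite-difference gradient, the learning rate, and the synthetic targets are all assumptions made for illustration.

```python
import itertools
import math


def gen_mean(a, w, p):
    """Weighted generalized mean; its p -> 0 limit is the weighted geometric mean.
    Assumes all memberships a_i > 0."""
    if abs(p) < 1e-8:
        return math.exp(sum(wi * math.log(ai) for wi, ai in zip(w, a)))
    return sum(wi * ai ** p for wi, ai in zip(w, a)) ** (1.0 / p)


def softmax(theta):
    """Maps unconstrained parameters to weights that are >= 0 and sum to 1."""
    m = max(theta)
    e = [math.exp(t - m) for t in theta]
    s = sum(e)
    return [x / s for x in e]


def loss(params, data):
    """Mean squared error; params = [theta_1, ..., theta_n, p]."""
    *theta, p = params
    w = softmax(theta)
    return sum((gen_mean(a, w, p) - t) ** 2 for a, t in data) / len(data)


def train(data, n_inputs, lr=2.0, steps=400, h=1e-5):
    """Plain gradient descent with central finite-difference gradients."""
    params = [0.0] * n_inputs + [1.0]   # start: equal weights, arithmetic mean
    for _ in range(steps):
        grad = []
        for k in range(len(params)):
            up, dn = params.copy(), params.copy()
            up[k] += h
            dn[k] -= h
            grad.append((loss(up, data) - loss(dn, data)) / (2 * h))
        params = [x - lr * g for x, g in zip(params, grad)]
    return params


# Hypothetical targets generated by a pessimistic node (weights 0.7/0.3, p = -3).
grid = [0.1 * i for i in range(1, 11)]
data = [((a, b), gen_mean((a, b), (0.7, 0.3), -3.0))
        for a, b in itertools.product(grid, grid)]
params = train(data, n_inputs=2)
print("weights:", softmax(params[:2]), "p:", params[2])
```

After the substitution, every parameter vector is feasible, so ordinary unconstrained descent applies; the learned p indicates the attitude of the node and the weights the relative importance of its inputs.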
Gradient descent methods have the disadvantage that they may sometimes converge to a local minimum. They can also be rather slow. Therefore, we investigated alternative training methods to mitigate these problems. Our experiments with random search methods indicate that they are superior to gradient descent methods when the networks are small. For large networks, random search methods require too much memory, and their speed also drops considerably, unless efficient ways to prune the search space are found.
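For comparison, a minimal random search for the same kind of node might look as follows. This is an illustrative local random search (perturb the current best parameters and keep only improvements), assumed here for the sketch; it is not necessarily the random search method referred to above. The step sizes, iteration count, and synthetic target node are hypothetical.

```python
import math
import random


def gen_mean(a, w, p):
    """Weighted generalized mean (p -> 0 limit: weighted geometric mean); assumes a_i > 0."""
    if abs(p) < 1e-8:
        return math.exp(sum(wi * math.log(ai) for wi, ai in zip(w, a)))
    return sum(wi * ai ** p for wi, ai in zip(w, a)) ** (1.0 / p)


def mse(w, p, data):
    return sum((gen_mean(a, w, p) - t) ** 2 for a, t in data) / len(data)


def random_search(data, n_inputs, iters=2000, step=0.3, seed=0):
    """Local random search: perturb the best (weights, p) so far, keep improvements only."""
    rng = random.Random(seed)
    w, p = [1.0 / n_inputs] * n_inputs, 1.0
    best = mse(w, p, data)
    for _ in range(iters):
        cand_w = [max(1e-6, wi + rng.gauss(0.0, 0.1 * step)) for wi in w]
        total = sum(cand_w)
        cand_w = [wi / total for wi in cand_w]   # renormalize so the weights sum to 1
        cand_p = p + rng.gauss(0.0, step)
        err = mse(cand_w, cand_p, data)
        if err < best:
            w, p, best = cand_w, cand_p, err
    return w, p, best


# Hypothetical targets from an optimistic node (weights 0.6/0.4, p = 4).
grid = [0.2, 0.4, 0.6, 0.8, 1.0]
data = [((a, b), gen_mean((a, b), (0.6, 0.4), 4.0)) for a in grid for b in grid]
w, p, best = random_search(data, n_inputs=2)
print("weights:", w, "p:", p, "mse:", best)
```

The search needs no gradients and cannot be trapped by a single descent path, but the number of evaluations grows quickly with the number of parameters, which matches the scaling concern for large networks.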

4. Elimination of Redundant Features/Criteria and Interpretation of Networks

One serendipitous result that emerged from our training technique is that it is also capable of detecting redundant features (or criteria). Because of this property, we can use our training methods even when the structure of the network and the criteria to be used are only approximately known. We defined three types of redundancies, corresponding to uninformative, unreliable, and superfluous criteria. Uninformative criteria are those whose degrees of satisfaction are always approximately the same, regardless of the situation. Such criteria therefore provide no information about the situation and contribute little to the decision-making process. Unreliable criteria correspond to criteria whose degrees of satisfaction do not affect the final decision; in other words, the final decision is the same for a wide range of degrees of satisfaction. Superfluous criteria are criteria that are, strictly speaking, not required to make the decision: decisions made without considering them may be just as accurate and reliable. However, superfluous criteria may be used to make the decision-making process more robust.

A connection is considered redundant if the weight associated with it gradually approaches zero (or falls below a small threshold value) as learning proceeds. This will happen if a particular criterion is relatively unimportant in making the decision. A node (associated with a criterion) is considered redundant if all the connections from the output of this node to other nodes become redundant. In general, we found that our training procedure is capable of detecting redundancies corresponding to uninformative and unreliable criteria. Our simulations show that in both cases, the weights corresponding to all the output connections of such nodes go to zero. Therefore, such nodes (criteria) are eliminated from the structure. Several examples of redundancy detection are given in [1,5,6,14]. We believe that this is one of the most powerful aspects of our approach. Another attractive feature of our approach is that the resulting networks can be interpreted as a set of "rules" that may be used for decision making. In other words, the proposed method captures an abstract model of the decision-making process.
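The pruning rule described above can be sketched as follows. The weight threshold of 0.05, the dictionary representation of the trained network, and the node names and weights in the example are all hypothetical; in practice the weights would come from one of the training procedures of Section 3.

```python
def prune_redundant(network, threshold=0.05):
    """network maps each aggregation node to {input_node: weight}.

    A connection is redundant if its trained weight is below `threshold`;
    a node is redundant if every connection leaving it is redundant (or
    leads only to nodes already found redundant). Returns the pruned
    network and the set of redundant nodes.
    """
    kept = {node: {src: w for src, w in ins.items() if w >= threshold}
            for node, ins in network.items()}
    all_sources = {src for ins in network.values() for src in ins}
    removed = set()
    changed = True
    while changed:                      # repeat so that removals can cascade
        changed = False
        for src in all_sources - removed:
            used = any(src in ins for node, ins in kept.items() if node not in removed)
            if not used:
                removed.add(src)
                changed = True
    return {n: ins for n, ins in kept.items() if n not in removed}, removed


# Hypothetical trained weights: "clutter" contributes almost nothing to the decision.
trained = {
    "decision": {"shape": 0.62, "thermal": 0.36, "clutter": 0.02},
    "shape": {"elongation": 0.7, "size": 0.3},
    "clutter": {"speckle": 1.0},
}
pruned, redundant = prune_redundant(trained)
print(sorted(redundant), pruned)
```

Once the weak connection is dropped, "clutter" has no remaining outputs, and the cascade then removes "speckle", which fed only "clutter"; what remains can be read off as a smaller rule structure.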

5. Methods for Calculating Memberships Based on Observed Feature Values

One frequent criticism of fuzzy-set-theoretic methods is that the membership functions are difficult to compute. However, we developed several methods to construct membership functions from representative training data and showed that they are effective for computer vision applications.

Fuzzy Clustering Methods: Let K denote the number of alternative decisions for which we wish to compute the relative supports (memberships) from the features. The observed feature values can be grouped into K clusters using a fuzzy clustering algorithm. The ith fuzzy cluster center is taken to be the value of the feature that best supports the ith decision if this feature value is used by itself as a criterion. These cluster centers are then used to compute the membership value (degree of satisfaction) of each criterion for a test feature. The fuzzy K-means algorithm is straightforward and has been used successfully in our research as well as in that of others in computer vision. However, like all clustering algorithms, it can be slow, and it will only work if the features exhibit natural groupings with respect to the chosen distance measure. The simple fuzzy K-means algorithm is effective only when the clusters are approximately hyperspherical and of roughly equal size. Several other fuzzy clustering algorithms exist, and the choice of algorithm depends on the nature of the clusters one expects in a problem. In this research effort, we developed a Compatible Cluster Merging (CCM) algorithm that is specially designed to find the optimum number of linear, planar, and hyperplanar clusters (i.e., clusters that lie in subspaces of the original space) when an upper bound on the number of clusters present is known. This algorithm was shown to be superior to more traditional validity-measure-based approaches. The effectiveness and advantages of the proposed technique in 2-D and 3-D applications have been demonstrated with both synthetic and real data. The proposed algorithm can be used not only for computing membership values, but also for character recognition, obtaining straight-line descriptions of intensity edge images, and obtaining planar approximations of 3-D (range) data. More details may be found in [8,18].
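A minimal one-dimensional fuzzy K-means (also known as fuzzy c-means) sketch follows, covering both steps described above: clustering the training feature values and computing a test feature's memberships from the resulting centers. The fuzzifier m = 2, the deterministic initialization from spread-out sorted samples, the fixed iteration count, and the sample values are assumptions; this is the standard algorithm, not the CCM algorithm of [8,18].

```python
def memberships(x, centers, m=2.0):
    """Fuzzy K-means membership of feature value x in each cluster (sums to 1)."""
    d = [abs(x - c) for c in centers]
    zeros = [i for i, di in enumerate(d) if di == 0.0]
    if zeros:   # x coincides with one or more centers
        return [1.0 / len(zeros) if di == 0.0 else 0.0 for di in d]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d[i] / d[j]) ** power for j in range(len(d)))
            for i in range(len(d))]


def fuzzy_k_means(xs, k, m=2.0, iters=100):
    """1-D fuzzy K-means; returns the k cluster centers."""
    s = sorted(xs)
    centers = [s[int((i + 0.5) * len(s) / k)] for i in range(k)]   # spread-out start
    for _ in range(iters):
        u = [memberships(x, centers, m) for x in xs]
        # Each center is the u^m-weighted average of the data.
        centers = [sum(u[j][i] ** m * xs[j] for j in range(len(xs))) /
                   sum(u[j][i] ** m for j in range(len(xs)))
                   for i in range(k)]
    return centers


# Hypothetical feature values from two classes.
xs = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.8, 5.1]
centers = sorted(fuzzy_k_means(xs, k=2))
print(centers, memberships(3.5, centers))
```

Each center plays the role of the feature value that best supports one decision, and the membership vector of a test value gives its degrees of satisfaction for the K criteria.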

Histogram-Based Membership Distributions: In this method, the normalized histogram from the training data is treated as a possibility distribution, and the membership in each class for a particular feature value is then calculated directly from these possibility functions. One advantage of this approach is that the membership values are absolute, i.e., the membership value in one class does not depend on the membership values in the other classes. Therefore, the addition of new classes to the problem can be handled easily. We successfully used this approach for membership calculation in the hierarchical aggregation of TV and FLIR data for the classification of tanks and armored personnel carriers [1,6,17]. As long as the distribution of feature values in the training set is fairly characteristic of the entire ensemble, this approach will produce good membership values for use in the aggregation scheme. While the typicality of the training samples is a concern in any pattern recognition algorithm, the power of the proposed decision-making scheme lies in the fact that the aggregation operators can be tailored to be insensitive to small variations in memberships, and in the fact that they are inherently compensatory, so that complementary and superfluous information can be used to overcome a faulty membership assignment in one feature.
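The histogram-based method can be sketched as below. We assume here that the histogram is normalized by its peak bin, so that the most frequent bin has membership 1 (one common way to turn a histogram into a possibility distribution); the bin range, bin count, and feature values are hypothetical.

```python
def histogram_possibility(samples, lo, hi, bins):
    """Possibility distribution from one class's training histogram (peak bin -> 1)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in samples:
        if lo <= s <= hi:
            counts[min(int((s - lo) / width), bins - 1)] += 1
    peak = max(counts)
    if peak == 0:
        return [0.0] * bins
    return [c / peak for c in counts]


def histogram_membership(value, poss, lo, hi):
    """Membership of a feature value, read directly from the possibility distribution."""
    if not lo <= value <= hi:
        return 0.0
    bins = len(poss)
    idx = min(int((value - lo) / ((hi - lo) / bins)), bins - 1)
    return poss[idx]


# Hypothetical "length" feature values for two classes, binned over [5, 9].
tank_poss = histogram_possibility([6.8, 7.0, 7.1, 7.2, 7.0, 6.9, 7.3], 5.0, 9.0, 8)
apc_poss = histogram_possibility([5.6, 5.8, 5.9, 6.1, 6.0], 5.0, 9.0, 8)
for value in (6.0, 6.85, 7.1):
    print(value, histogram_membership(value, tank_poss, 5.0, 9.0),
          histogram_membership(value, apc_poss, 5.0, 9.0))
```

Because each class is built from its own histogram, a membership in one class is computed without reference to any other class, which is what makes adding a class straightforward.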

Heuristic Methods: In this approach, the possibility functions are assumed to be Gaussian, trapezoidal, or of any other suitable shape, and the membership values are calculated using these assumed functions. The advantage of this method is that it is relatively insensitive to the training data. This is a good method to use for features such as "position", "range", and "speed". We have successfully utilized this technique in a multi-(color)sensor fusion problem [1,6,19].
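The heuristic shapes mentioned above take only a few lines each. The parameterizations below (a center and spread for the Gaussian, four breakpoints for the trapezoid) are the usual ones, assumed here for illustration, and the breakpoints in the usage line are hypothetical.

```python
import math


def gaussian(x, center, sigma):
    """Gaussian possibility function, equal to 1 at the center."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def trapezoid(x, a, b, c, d):
    """Trapezoidal possibility function: 0 outside (a, d), 1 on [b, c],
    linear in between. Assumes a < b <= c < d."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


# Hypothetical linguistic term "range is medium" with breakpoints in meters.
print(trapezoid(150, 100, 200, 400, 600), gaussian(2.5, 3.0, 1.0))
```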


6. Experiments with Aggregation Networks


Extensive testing of the proposed techniques was conducted on both synthetic and real data.

Experiments on synthetic data: Several examples of results obtained with synthetic data are given in [1,5,6]. These include several single-level and multi-level aggregation problems (including the exclusive OR problem and the “stool” problems). In almost every case, the outputs produced by the networks matched the desired values very closely. Our training algorithms were also able to perform the redundancy detection discussed in Section 4. The results on the two-class problem indicate that the performance of our method is as good as that of Bayesian methods on data specifically designed for the Bayes method. In addition, our method is able to pick strong features and discard weak ones with no loss in performance.

Experiments on real data: Experimental results on a wide variety of problems, ranging from the evaluation of creditworthiness to the segmentation and labeling of outdoor color scenes, are described in detail in [1,5,6,17,19]. Our solution to the creditworthiness problem proved to be superior to the empirical methods used by Zimmermann [14].

Experiments on multi-modality information fusion were conducted using TV (intensity) and FLIR (forward-looking infrared) data to classify tanks and armored personnel carriers, with promising results. We experimented with a variety of aggregation networks: single-level; multiple-level, where the information from one modality is combined first; and multiple-level, where the information from the same type of feature is combined first. These experiments are explained in [1,6,17]. A comparison of our results with those obtained by probabilistic and belief-theoretic methods indicates that our methods are superior. Results of our experiments on fusing information from three color channels are also given in [6,19]. We obtained the color images from the University of Massachusetts (Hanson, Riseman et al.). Note that we perform segmentation and labeling simultaneously. The results are excellent, considering the complexity of the problem. Again, several types of aggregation networks were considered. One important result of these experiments is that the networks that result (after the redundancy detection process is complete) can be interpreted as a set of rules. Thus, our training methods may be used to generate decision rules.

7. Theoretical Investigation on the Generalization of the Fuzzy Integral

The fuzzy integral is another numeric-based approach that we have used for both segmentation and object recognition. It also uses a hierarchical network of evidence sources to arrive at a confidence value for a particular hypothesis or decision. The difference from the preceding method is that, besides this directly supplied objective evidence, the fuzzy integral utilizes information concerning the worth or importance of the sources in the decision-making process. The fuzzy integral relies on the concept of a fuzzy measure, which generalizes probability measure in that it does not require additivity, replacing it with the weaker conditions of monotonicity and continuity. The fuzzy integral is interpreted as an evaluation of object classes where the subjectivity is embedded in the fuzzy measure. In our applications, the integral is evaluated over a set of information sources (sensors, algorithms, features, etc.), and the function being integrated supplies a confidence value for a particular hypothesis or class from the standpoint of each individual source of information. In comparison with probability theory, the fuzzy integral corresponds to the concept of expectation. The fuzzy integral values provide a different measure of certainty in the classification than posterior probabilities do. Since the integral evaluations need not sum to one, lack of evidence and negative evidence can be distinguished.

The fuzzy integral is a flexible, nonlinear information fusion methodology that combines objective evidence (membership values) with expected values of subsets of the sources (as embodied in a fuzzy measure). A particularly useful family of fuzzy measures is due to Sugeno. Our initial investigations focused on the Sugeno integral for several reasons. First, Sugeno measures possess recursive generation properties, allowing for efficient implementation: because of the structure of the fuzzy integral, it is only necessary to compute the measure on $n$ subsets, instead of $2^n$ subsets, for each computation. Also, all Sugeno measures are either belief functions or plausibility functions, in the sense of Dempster-Shafer. Thus, fuzzy integrals using Sugeno measures provide a mechanism for joining fuzzy set theory and belief theory. References [5,11] describe our investigations with these techniques.
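The recursive generation of a Sugeno (λ-) measure and the resulting Sugeno integral can be sketched as follows. The sketch uses the standard formulas: λ is the unique root in (-1, ∞), λ ≠ 0, of $\prod_i (1 + \lambda g^i) = 1 + \lambda$, and $g(A_i) = g^i + g(A_{i-1}) + \lambda g^i g(A_{i-1})$ for the nested sets $A_i$ of the sources sorted by decreasing evidence. The bisection solver and the example numbers are choices made for illustration, and the sketch assumes at least two densities, each strictly between 0 and 1.

```python
import math


def sugeno_lambda(densities):
    """Solve prod(1 + lam*g_i) = 1 + lam for the unique lam in (-1, inf), lam != 0.

    If the densities sum to 1, the measure is additive and lam = 0.
    """
    def f(lam):
        return math.prod(1.0 + lam * g for g in densities) - (1.0 + lam)

    total = sum(densities)
    if abs(total - 1.0) < 1e-12:
        return 0.0
    if total < 1.0:                  # root is positive: bracket it by doubling
        lo, hi = 1e-9, 1.0
        while f(hi) < 0.0:
            hi *= 2.0
    else:                            # root lies in (-1, 0)
        lo, hi = -1.0 + 1e-12, -1e-9
    for _ in range(200):             # bisection on a sign-changing bracket
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0.0) == (f(lo) > 0.0):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sugeno_integral(h, densities):
    """Sugeno fuzzy integral of evidence h (one value per source).

    Sources are sorted by decreasing evidence; the measure of each nested set
    A_i is generated recursively, so only n subsets are ever evaluated.
    Returns the integral and g(X), which should equal 1.
    """
    lam = sugeno_lambda(densities)
    order = sorted(range(len(h)), key=lambda i: h[i], reverse=True)
    g_prev, result = 0.0, 0.0
    for i in order:
        g_curr = densities[i] + g_prev + lam * densities[i] * g_prev
        result = max(result, min(h[i], g_curr))   # max of mins
        g_prev = g_curr
    return result, g_prev


print(sugeno_integral([0.9, 0.6, 0.3], [0.3, 0.4, 0.2]))
```

Here the densities sum to less than 1, so λ is positive, and the final recursive step returns g(X) = 1 up to rounding, a convenient check on the solver.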

We have extended the theoretical utility of the Sugeno fuzzy integral for information fusion. There were two extensions to this basic approach. First, we replaced the Sugeno measures by a more general class of fuzzy measures, the so-called S-decomposable measures. Besides the value of generalization to a wider family of fuzzy integrals, many of these measures have generation formulae that require fewer computations than the Sugeno measures do. The algorithm for a fuzzy integral pattern recognition problem using these measures is identical to that for the Sugeno measures; the only difference is the method for generating the measures during each evaluation. In fact, even the training methods, i.e., learning the fuzzy densities from labeled training data, are identical to those we developed for Sugeno measures. Only the calculation of the measure of an arbitrary subset changes. Results in [2,5,15,22,23] show the extent of our research into the theory, training, and utilization of the fuzzy integral in computer vision applications.

The second extension to the standard fuzzy integral involves the definition of the fuzzy integral itself. The original fuzzy integral produces a “best pessimistic” combination of the objective evidence from knowledge sources and the worth (importance) of subsets of those sources. This is achieved by taking the maximum of a set of minimums. By changing the maximum to one of a selected class of co-t-norms, or by replacing the minimum by one of a class of t-norms, many different algorithms for evidence combination can be obtained. These can be either more or less optimistic than the original fuzzy integral, giving the designer more flexibility in the design of an algorithm (or suite of algorithms) for a particular problem.
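A generalized integral of this kind can be sketched by passing the t-norm, the t-conorm, and the generator S of an S-decomposable measure as functions. The specific operators below (bounded sum as S, algebraic product as a t-norm, algebraic sum as a t-conorm) are illustrative choices, not necessarily those studied in [2,5,15,22,23]; the sketch also assumes the densities are such that S applied to all of them gives 1, so that g(X) = 1.

```python
def bounded_sum(a, b):
    """Lukasiewicz t-conorm, used here to generate an S-decomposable measure."""
    return min(1.0, a + b)


def probabilistic_sum(a, b):
    """Algebraic-sum t-conorm (never below max, so more optimistic)."""
    return a + b - a * b


def product(a, b):
    """Algebraic-product t-norm (never above min, so more pessimistic)."""
    return a * b


def generalized_integral(h, densities, t_norm, t_conorm, s_measure):
    """t_conorm over i of t_norm(h_(i), g(A_i)), sources sorted by decreasing h.

    g(A_i) follows the S-decomposable rule g(A_i) = S(g^i, g(A_{i-1})).
    With t_norm=min and t_conorm=max this is the max-of-mins form.
    """
    order = sorted(range(len(h)), key=lambda i: h[i], reverse=True)
    g, result = 0.0, 0.0
    for i in order:
        g = s_measure(densities[i], g)             # measure of the nested set A_i
        result = t_conorm(result, t_norm(h[i], g))
    return result


h, dens = [0.9, 0.6, 0.3], [0.5, 0.6, 0.4]
standard = generalized_integral(h, dens, min, max, bounded_sum)
pessimistic = generalized_integral(h, dens, product, max, bounded_sum)
optimistic = generalized_integral(h, dens, min, probabilistic_sum, bounded_sum)
print(pessimistic, standard, optimistic)
```

Since the product never exceeds the minimum and the algebraic sum never falls below the maximum, the first variant is at most as large as the max-of-mins form and the second at least as large, which is the pessimistic/optimistic flexibility described above.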
8. Methods for Calculating Densities Based on Observed Feature Values

The generation of fuzzy density values is crucial to the success of the fuzzy integral. We have developed methods to generate these density values from the histograms of the training data. These methods are based on how well separated the histograms of the same feature are for different classes. Our new method has a strong mathematical justification and, in addition, produces intuitively pleasing results [15]. Furthermore, for the fuzzy integral with respect to a possibility measure (a particularly convenient measure from the computational standpoint), we have introduced a density training method that performs a search in density space for the "best" set of densities with respect to a training set [16,24]. This is only possible because of the simple measure generation scheme. The details of the proposed methods can be found in [16,24].
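The exact density-generation method is given in [15]; the sketch below only illustrates the general idea of separation-based densities with a hypothetical measure, one minus the overlap (histogram intersection) of a feature's class-conditional histograms. The function names and bin counts are invented for the example. Before being used with a Sugeno measure, such values would still need to lie strictly between 0 and 1.

```python
def sum_normalize(counts):
    total = sum(counts)
    return [c / total for c in counts]


def separation_density(hist_a, hist_b):
    """Hypothetical density for one feature: 1 minus the overlap (histogram
    intersection) of its class-conditional histograms over the same bins.
    0 when the two histograms coincide, 1 when they do not overlap at all."""
    pa, pb = sum_normalize(hist_a), sum_normalize(hist_b)
    overlap = sum(min(x, y) for x, y in zip(pa, pb))
    return 1.0 - overlap


# Hypothetical bin counts for two features measured on two classes.
length_density = separation_density([0, 1, 5, 3, 0, 0], [0, 0, 0, 2, 6, 1])
texture_density = separation_density([2, 4, 3, 1, 0, 0], [1, 3, 4, 2, 0, 0])
print(length_density, texture_density)
```

The better-separated feature receives the larger density, so it carries more weight in the fuzzy measure.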

9. Experiments with the Fuzzy Integral

In [2,5,11,22,23], we developed several theoretical properties of these integrals and examined their behavior on both simulated data and data from multi-sensor Automatic Target Recognition (FLIR and TV). Almost all the ATR experiments that were used to test the aggregation network approach were repeated with the fuzzy integral approach. Applied to the problem of fusing information from FLIR (Forward Looking Infrared) and TV data to classify tanks and armored personnel carriers from statistical and texture features in one-, two-, and three-level networks, the generalized fuzzy integral produced excellent results [11,22,23]. A comparison of our results with probabilistic and belief-theoretic methods indicates that our methods are superior. The final crisp partition is equivalent to that of the aggregation networks, but the problem formalisms, the training procedures, and the interpretation of the results differ.

10. Evidence Aggregation Networks for Fuzzy Logic Inference

Fuzzy logic has recently gained considerable attention. This technology has been successfully applied to numerous problems, mostly in the control area, where the complexity of the system tends to preclude an analytic solution. However, it is equally powerful in pattern recognition and multicriteria decision-making environments. Fuzzy logic works well in those cases where the important decision-making criteria can be expressed in terms of commonsense, linguistically stated rules. The uncertainty in the rules is modeled numerically by fuzzy sets representing the meanings of the antecedent and consequent clauses. Once the rules have been modeled, an inference procedure is necessary to derive conclusions from uncertain conditions. Unlike predicate calculus, which offers traditional modus ponens, fuzzy logic presents a multitude of inference mechanisms.

Since all rules (with common antecedent clause variables) can "fire" simultaneously in a
fuzzy logic system, the computational load is considerable. Special-purpose chips have been
designed and built to perform particular versions of inference. Neural network structures also
provide a means for parallel and flexible computation. We introduced two network structures to
perform fuzzy logic inference. The first was a hand-crafted network with desirable theoretical
properties, whereas the second was a standard multilayer perceptron, which has been shown to be
capable of learning complex linguistic relationships between the antecedent and consequent of
fuzzy logic rules.
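
To illustrate why the load grows with the number of rules and why parallel hardware helps, here is a minimal min-max (Mamdani-style) sketch in which every rule fires on crisp inputs and the clipped consequents are aggregated with max. It is not either of the network structures described above; the triangular membership functions, the two-rule base, and all names are hypothetical.

```python
def tri(a, b, c):
    """Triangular membership function with feet a, c and peak b (a < b < c)."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu


def fire_all(rules, inputs):
    """Fire every rule on crisp inputs and aggregate the results.

    Each rule is (antecedent membership functions, one per input;
    consequent fuzzy set over a discrete output universe). A rule's firing
    strength depends only on the inputs, so the loop iterations are
    independent and could run in parallel; here they run serially.
    """
    out = [0.0] * len(rules[0][1])
    for antecedents, consequent in rules:
        strength = min(mu(x) for mu, x in zip(antecedents, inputs))  # AND as min
        for y, b in enumerate(consequent):
            out[y] = max(out[y], min(strength, b))  # clip consequent, OR as max
    return out


LOW, HIGH = tri(-5, 0, 5), tri(5, 10, 15)
rules = [
    ([LOW, HIGH], [1.0, 0.5, 0.0]),  # if x1 is LOW and x2 is HIGH then y is SMALL
    ([HIGH, LOW], [0.0, 0.5, 1.0]),  # if x1 is HIGH and x2 is LOW then y is LARGE
]
print(fire_all(rules, [2.0, 7.0]))  # [0.4, 0.4, 0.0]
```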

We introduced trainable evidence aggregation networks that use families of fuzzy set-theoretic
connectives in Section 2 [6,7]. Since fuzzy inference concerns the fusion of evidence (how well
the input possibility matches that of the antecedent), it is natural to use these highly flexible
and powerful network structures in the inference mechanism. Our latest activity in this area
combines these methodologies to generalize and, at the same time, simplify the hand-crafted
networks for fuzzy logic inference. Furthermore, because of the generalization to a parametrically
defined family of aggregation operators, we implemented a learning algorithm that allowed the
networks to exceed their theoretically predicted performance. In [9] we proposed a fixed network
architecture employing general fuzzy unions and intersections as a mechanism to implement fuzzy
logic inference, and showed that these networks possess desirable theoretical properties. Networks
based on parametrized families of operators (such as Yager’s union and intersection) have
additional predictable properties and admit a training algorithm that produces sharper inference
results than those obtained earlier. Simulation studies were presented which corroborate the
theoretical properties.
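
The sketch below writes out Yager's union and intersection for a parameter w > 0, using their standard definitions. At w = 1 they reduce to the bounded sum min(1, a + b) and the bounded difference max(0, a + b - 1); as w grows they approach max and min. This single parameter is what makes the family trainable. The `fit_w` helper is a hypothetical stand-in that chooses w by plain grid search against target outputs; it is not the learning algorithm described above.

```python
def yager_union(values, w):
    """Yager union: min(1, (sum of v**w) ** (1/w)), for w > 0."""
    if w <= 0:
        raise ValueError("w must be positive")
    return min(1.0, sum(v ** w for v in values) ** (1.0 / w))


def yager_intersection(values, w):
    """Yager intersection: 1 - min(1, (sum of (1-v)**w) ** (1/w)), for w > 0."""
    if w <= 0:
        raise ValueError("w must be positive")
    return 1.0 - min(1.0, sum((1.0 - v) ** w for v in values) ** (1.0 / w))


def fit_w(samples, targets, op, grid):
    """Hypothetical helper: pick the w in `grid` minimising squared error.

    A plain grid search, used here only as a stand-in for a training procedure.
    """
    def loss(w):
        return sum((op(s, w) - t) ** 2 for s, t in zip(samples, targets))
    return min(grid, key=loss)


print(yager_union([0.3, 0.4], 1))         # about 0.7 (bounded sum)
print(yager_intersection([0.7, 0.8], 1))  # about 0.5 (bounded difference)
print(yager_union([0.3, 0.4], 50))        # about 0.4 (close to max for large w)

samples = [[0.2, 0.5], [0.6, 0.3], [0.1, 0.9]]
targets = [yager_union(s, 2.0) for s in samples]  # synthetic targets made with w = 2
print(fit_w(samples, targets, yager_union, [0.5, 1.0, 2.0, 4.0, 8.0]))  # 2.0
```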

11. Range and intensity information fusion using morphological methods

Range images provide an explicit encoding of the shape and geometric structure of the objects
in the image, as seen from the sensor. Since morphological methods are inherently geometric, they
are ideally suited to the analysis of range images. However, morphological edge operators meant
for intensity images cannot be used to detect edges in range images, because roof and crease edges
do not correspond to depth discontinuities. We developed two edge detection and classification
schemes for range images based on morphological operations. The first method uses the residues of
openings and closings to detect roof and crease edges. Directional sensitivity to edges is
incorporated by using structuring elements oriented in different directions. The second method
employs the residues of dilation and erosion at multiple scales and provides a richer description
of the surface structure at each point in the image by classifying each pixel into one of eight
structure types: positive roof, negative roof, positive crease, negative crease, top of step, base
of step, ramp, and constant surface.
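
A minimal NumPy sketch of the residues involved follows, assuming flat line structuring elements of length 2r+1 along one image axis (a stand-in for the oriented elements) and edge-replicated borders. The opening residue f - opening(f) responds to narrow ridges across the chosen axis, and the closing residue closing(f) - f responds to narrow valleys; which of these is called a roof and which a crease depends on the sign convention of the range data. The multiscale dilation and erosion residues are computed as well, but the eight-way classification rules are not reproduced. All function names are hypothetical.

```python
import numpy as np


def _line_filter(f, r, axis, op):
    """Flat line structuring element of length 2r+1 along `axis`.

    `op` is np.min (erosion) or np.max (dilation); borders replicate edge values.
    """
    pad = [(0, 0)] * f.ndim
    pad[axis] = (r, r)
    g = np.pad(f, pad, mode="edge")
    n = f.shape[axis]
    shifted = [np.take(g, np.arange(k, k + n), axis=axis) for k in range(2 * r + 1)]
    return op(np.stack(shifted), axis=0)


def erode(f, r, axis):
    return _line_filter(f, r, axis, np.min)


def dilate(f, r, axis):
    return _line_filter(f, r, axis, np.max)


def opening_residue(f, r, axis):
    """f - opening(f): positive on ridges narrower than the element, across `axis`."""
    return f - dilate(erode(f, r, axis), r, axis)


def closing_residue(f, r, axis):
    """closing(f) - f: positive on valleys narrower than the element, across `axis`."""
    return erode(dilate(f, r, axis), r, axis) - f


def multiscale_residues(f, radii, axis):
    """Map each radius r to (dilate(f) - f, f - erode(f)) at that scale."""
    return {r: (dilate(f, r, axis) - f, f - erode(f, r, axis)) for r in radii}


cols = np.arange(7)
ridge = np.tile(10.0 - np.abs(cols - 3), (5, 1))  # ridge running along axis 0
print(opening_residue(ridge, 1, axis=1)[0])       # nonzero only at column 3
print(opening_residue(ridge, 1, axis=0).any())    # False: element parallel to the ridge
```

For a step such as [0, 0, 0, 5, 5, 5], the dilation residue is nonzero on the low side of the jump and the erosion residue on the high side, which is the kind of distinction the top-of-step and base-of-step labels draw.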

Morphological  methods  have  the  advantage  of  simplicity,  speed  and  parallelism.  The 
methods  we  have  developed  can  be  used  for  range  edge  detection,  edge  and  surface 
characterization,  segmentation  of  range  images,  and  determination  of  hold  sites  in  robotic 
applications.  Several  applications,  including  the  fusion  of  edge  information  from  registered 
range/intensity  images,  are  described  in  [4,10,20,23]. 

Edges in intensity images arise from changes in illumination and surface reflectance, which may
or may not reflect changes in the geometry of the object. Range images, on the other hand, contain
edges due solely to changes in the physical shape and structure of the object. If registered
intensity/range image pairs are available, we can therefore separate 1) edges due to the geometric
structure of the object from 2) edges due to changes in illumination and surface reflectance. The
utility of such an approach can be seen, for example, in an application that requires a robot to
pick up packages and also to read the characters printed on them. In intensity images alone, the
reflectance contrast of the lettering gives rise to edges that are not hold sites for a robot arm.
The range image, on the other hand, can be used to determine such hold sites safely. Further, by
removing these geometric edges from the intensity edge map, we can locate the lettering.

From a registered image pair denoted REF (for reflectance image) and RAN (for range image),
we obtain the edge maps REF_edge and RAN_edge. The problem is then to obtain 1) edges common to
REF_edge and RAN_edge, 2) edges only in RAN_edge (geometric edges), and 3) edges only in REF_edge
(non-geometric edges). To locate all REF_edge pixels that are common with the RAN_edge image, we
examine a (2n+1) x (2n+1) neighborhood, for some small n, of each edge pixel in RAN_edge. If we
find an edge pixel of the REF_edge image in this neighborhood, we mark that pixel as a common edge,
i.e., it appears in the COM_edge image, the edge image consisting of edges common to RAN_edge and
REF_edge. Examining the (2n+1) x (2n+1) neighborhood of every edge pixel in RAN_edge can be
accomplished by dilating the (binary) edge map RAN_edge by a (2n+1) x (2n+1) square structuring
element. The dilated RAN_edge is then ANDed with REF_edge.
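
The dilate-then-AND step can be sketched with NumPy on boolean edge maps as follows. The square dilation is written out by OR-ing shifted copies of a zero-padded map, so no morphology library is assumed; the function names and the toy edge maps are hypothetical.

```python
import numpy as np


def dilate_square(edges, n):
    """Binary dilation by a (2n+1) x (2n+1) square structuring element."""
    edges = np.asarray(edges, dtype=bool)
    h, w = edges.shape
    padded = np.pad(edges, n, mode="constant")  # pads with False
    out = np.zeros((h, w), dtype=bool)
    for dy in range(2 * n + 1):
        for dx in range(2 * n + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def common_edges(ran_edge, ref_edge, n):
    """COM_edge: REF_edge pixels within the (2n+1) x (2n+1) neighborhood of a RAN_edge pixel."""
    return dilate_square(ran_edge, n) & np.asarray(ref_edge, dtype=bool)


ran = np.zeros((5, 7), dtype=bool)
ran[:, 2] = True      # a range (geometric) edge in column 2
ref = np.zeros((5, 7), dtype=bool)
ref[:, 3] = True      # the same edge seen in intensity, offset by one pixel
ref[1, 6] = True      # an isolated intensity edge, e.g. part of the lettering
com = common_edges(ran, ref, n=1)
print(com.sum())      # 5: only the offset column is marked as common
```

The neighborhood size n sets how much misregistration between the two edge maps is tolerated.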

The difference of REF_edge and COM_edge (REF_edge minus COM_edge) gives the non-geometric edges
(NONGEOM_edge), while the difference of RAN_edge and COM_edge gives the geometric edges (GEOM_edge)
not found in REF_edge (usually roofs, creases, and low-threshold jumps, since these may not be
detected in the intensity images). How the edge information contained in RAN_edge, REF_edge,
COM_edge, GEOM_edge, and NONGEOM_edge is used depends on the application.
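
On boolean edge maps, the two set differences reduce to an AND with a complement, as this minimal NumPy sketch shows. The function name and the one-row example maps are hypothetical.

```python
import numpy as np


def split_edges(ran_edge, ref_edge, com_edge):
    """Return (GEOM_edge, NONGEOM_edge) as set differences of binary edge maps."""
    geom = ran_edge & ~com_edge     # RAN_edge minus COM_edge: geometric edges
    nongeom = ref_edge & ~com_edge  # REF_edge minus COM_edge: non-geometric edges
    return geom, nongeom


ran = np.array([[1, 1, 0, 0]], dtype=bool)
ref = np.array([[0, 1, 0, 1]], dtype=bool)
com = np.array([[0, 1, 0, 0]], dtype=bool)  # the one coincident edge pixel
geom, nongeom = split_edges(ran, ref, com)
print(geom.astype(int), nongeom.astype(int))  # [[1 0 0 0]] [[0 0 0 1]]
```

Because COM_edge holds REF_edge pixels, a RAN_edge pixel that sits a pixel or two beside its intensity counterpart survives the plain difference. Dilating COM_edge by the same square before subtracting it from RAN_edge is one way to suppress such pixels; that variant is an assumption and is not part of the procedure described above.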